{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "n461or5eb8du"
   },
   "source": [
    "# Autoencoders\n",
    "\n",
    "In this notebook, we'll introduce and explore \"autoencoders,\" which are a very successful family of models in modern deep learning. In particular we will:\n",
    "\n",
    "\n",
    "1.   Illustrate the connection between autoencoders and classical *Principal Component Analysis (PCA)*\n",
    "3.   Train a non-linear auto-encoder that uses a deep neural network\n",
    "\n",
    "### Overview\n",
    "As explained in the text, autoencoders are a way of discovering *latent, low-dimensional structure* in a dataset. In particular, a random data vector $X \\in \\mathbb{R}^d$ can be said to have low-dimensional structure if we can find some functions $f: \\mathbb{R}^d \\to \\mathbb{R}^k$ and $g: \\mathbb{R}^k \\to \\mathbb{R}^d$, with $k \\ll d$, such that $$g(f(X)) \\approx X.$$\n",
    "In other words, $f(X)$ is a parsimonious, $k$-dimensional representation of $X$ that contains all of the information necessary to approximately reconstruct the full vector $X$. Traditionally, $f(X)$ is called an *encoding* of $X$.\n",
    "\n",
    "It turns out that this is meaningless unless we restrict what kinds of functions $f$ and $g$ are allowed to be, because it's possible to write down some (completely ugly) one-to-one function $\\mathbb{R}^d \\to \\mathbb{R}^1$ for any $d$. This gives rise to the notion of *autoencoders* where, given some sets of reasonable functions $F$ and $G$, we aim to minimize\n",
    "$$\\mathbb{E}[\\mathrm{loss}(X, f(g(X))]$$ over functions $f \\in F$ and $g \\in G$. As usual, this is done by minimizing the sample analog.\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "5XzJoBaIvksv"
   },
   "source": [
    "## Linear Autoencoders and PCA: Practice\n",
    "\n",
    "It turns out that linear autoencoders are the same as PCA. Let's do a small sanity check to verify this. In particular, let's perform PCA two ways: first using a standard (linear algebra) toolkit, and second as a linear autoencoder using a neural network library.\n",
    "If all goes well, they should give you the same reconstructions!\n",
    "\n",
    "To make it a bit more fun, we will use the [*Labeled Faces in the Wild*](https://www.kaggle.com/jessicali9530/celeba-dataset) dataset which consists of standardized images of roughly 5,000 celebrities' faces. In this data, PCA amounts to looking for a small number of \"proto-faces\" such that a linear combination of them can accurately reconstruct any celebrity's face."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "DQfDiQ8aqpyE"
   },
   "outputs": [],
   "source": [
    "# First, let's download and inspect the data!\n",
    "from sklearn.datasets import fetch_lfw_people\n",
    "faces = fetch_lfw_people()\n",
    "\n",
    "# 3D Array \"faces.images\" contains images as 2d arrays, stacked along dimension 0\n",
    "n_examples, height, width = faces.images.shape\n",
    "\n",
    "# 2D Array \"design_matrix\" encodes each image as a 1d numeric row, as is conventional in statistics\n",
    "design_matrix = faces.images.reshape((n_examples, -1))\n",
    "\n",
    "n_features = design_matrix.shape[1]\n",
    "\n",
    "print(\n",
    "    \"Labeled Faces in the Wild Dataset: \\n\\\n",
    "    Number of examples: {}\\n\\\n",
    "    Number of features: {}\\n\\\n",
    "    Image height: {}\\n\\\n",
    "    Image width: {}\".format(\n",
    "        n_examples,\n",
    "        n_features,   # per image\n",
    "        height,\n",
    "        width))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "PX_E23v-5yZY"
   },
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import matplotlib.pyplot as plt\n",
    "\n",
    "# Let's gather all the images corresponding to Arnold Scwarzenegger to use as examples\n",
    "\n",
    "# Make a list (of length one!) of labels corresponding to Arnold\n",
    "# Array \"faces.target_names\" tells us which numeric label (index) corresponds to which person name (value)\n",
    "arnold_labels = np.where(faces.target_names == 'Arnold Schwarzenegger')\n",
    "\n",
    "# Get indices of all images corresponding to this label\n",
    "# Array \"faces.target\" tells us which image (index) corresponds to which numeric image labels (value)\n",
    "arnold_pics = np.where(np.isin(faces.target, arnold_labels))[0]\n",
    "\n",
    "\n",
    "# Make a helper function so we can take a look at our target images\n",
    "def plot_faces(images, n_row=2, n_col=3):\n",
    "    \"\"\"Helper function to plot a gallery of portraits\"\"\"\n",
    "    plt.figure(figsize=(1.5 * n_col, 2.2 * n_row))\n",
    "    plt.subplots_adjust(0.6, 0.5, 1.5, 1.5)\n",
    "    for i in range(n_row * n_col):\n",
    "        plt.subplot(n_row, n_col, i + 1)\n",
    "        plt.imshow(images[i].reshape((height, width)), cmap=plt.cm.gray)\n",
    "        plt.xticks(())\n",
    "        plt.yticks(())\n",
    "    plt.tight_layout()\n",
    "    plt.show()\n",
    "\n",
    "\n",
    "# Let's try it out!\n",
    "plot_faces(\n",
    "    faces.images[arnold_pics[:6], :, :]  # first six images of Arnold appearing in the dataset\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "Q74vWlBxyIgA"
   },
   "outputs": [],
   "source": [
    "# 1. Find the first 32 principal components of the dataset using the Scikit-learn library\n",
    "# For extra fun, you can do so directly using the singular value decomposition (your mileage may vary!)\n",
    "\n",
    "# We'll use a standard library, which uses linear algebra to compute the principal components.\n",
    "from sklearn.decomposition import PCA\n",
    "\n",
    "# There's no need to de-mean the data. Can you explain why?\n",
    "pca = PCA(n_components=128, svd_solver='randomized').fit(design_matrix)\n",
    "# out of 2914 eigenvectors, we pick the 128 associated to the biggest eigenvalues"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "sLi4k8t3DrHe"
   },
   "outputs": [],
   "source": [
    "# 2. Plot the first 6 \"eigenfaces,\" the six images whose linear span best explains the variation in our dataset\n",
    "eigenfaces = pca.components_\n",
    "plot_faces(eigenfaces[:6])\n",
    "# we check the first six eigenvectors/projection axes, reshaped\n",
    "# (the eigenvectors that captured the highest variation in our dataset of images)\n",
    "# here, eigenvector1 orthog to eigenvector2 and all the others => decorrelation\n",
    "# (there's no way to reconstruct eigenvector1 using a linear combination of all the other eigenvectors)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "Gmj2lpTfCXKC"
   },
   "outputs": [],
   "source": [
    "# 3. Plot Arnold's face (any image will do!) reconstructed using 1, 2, 8, 32, 64, 128 principal components\n",
    "face_vector = design_matrix[arnold_pics[1]]\n",
    "\n",
    "\n",
    "def reconstruct(image_vector, n_components):\n",
    "    return eigenfaces[:n_components].T @ (eigenfaces[:n_components] @ image_vector.reshape((-1, 1)))\n",
    "\n",
    "\n",
    "reconstructions = [reconstruct(face_vector, k) for k in [1, 2, 8, 32, 64, 128]]\n",
    "plot_faces(reconstructions, n_row=2, n_col=3)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "eoZ_BsXYDE7P"
   },
   "outputs": [],
   "source": [
    "# 4. Train linear autoencoder with 64 neurons using Keras\n",
    "# 5. Compare reconstructions of Arnold's face both using MSE and visually"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "urlMaifVJCDc"
   },
   "outputs": [],
   "source": [
    "from tensorflow.keras import Model, Input\n",
    "from tensorflow.keras.layers import Dense\n",
    "\n",
    "encoding_dimension = 64\n",
    "input_image = Input(shape=(n_features,))\n",
    "encoded = Dense(encoding_dimension, activation='linear')(input_image)\n",
    "decoded = Dense(n_features, activation='linear')(encoded)\n",
    "\n",
    "autoencoder = Model(input_image, decoded)\n",
    "\n",
    "autoencoder.compile(optimizer='adam', loss='mse')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "5OTUbWg8NcIE"
   },
   "outputs": [],
   "source": [
    "autoencoder.fit(design_matrix, design_matrix,\n",
    "                epochs=50,\n",
    "                batch_size=256,\n",
    "                shuffle=True,\n",
    "                verbose=0)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "omt7zwGfOOQF"
   },
   "outputs": [],
   "source": [
    "# Compute neural reconstruction\n",
    "reconstruction = autoencoder.predict(face_vector.reshape(1, -1))\n",
    "\n",
    "# Do visual comparison\n",
    "plot_faces([reconstructions[4], reconstruction], n_row=1, n_col=2)\n",
    "\n",
    "# Do numeric comparison\n",
    "# We also normalize the black/white gradient to take values in [0,1] (divide by 255)\n",
    "np.mean(np.power((reconstructions[4].T - reconstruction) / 255, 2))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "zrdD6A55yJML"
   },
   "source": [
    "## Neural Autoencoders\n",
    "\n",
    "Finally, let's train a nonlinear autoencoder for the same data where $F$ and $G$ are neural networks, and we restrict the dimension to be $k=64$."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "DSN-UliDzXyg"
   },
   "outputs": [],
   "source": [
    "# Use a nonlinear neural network\n",
    "from tensorflow.keras import Model, Input\n",
    "from tensorflow.keras.layers import Dense\n",
    "\n",
    "input_image = Input(shape=(n_features,))\n",
    "encoded = Dense(64, activation='relu')(input_image)\n",
    "encoded = Dense(64, activation='relu')(encoded)\n",
    "decoded = Dense(64, activation='relu')(encoded)\n",
    "decoded = Dense(n_features, activation='relu')(decoded)\n",
    "\n",
    "autoencoder = Model(input_image, decoded)\n",
    "\n",
    "autoencoder.compile(optimizer='adam', loss='mse')\n",
    "\n",
    "autoencoder.fit(design_matrix, design_matrix,\n",
    "                epochs=50,\n",
    "                batch_size=256,\n",
    "                shuffle=True,\n",
    "                verbose=0)\n",
    "\n",
    "# Compute neural reconstruction\n",
    "reconstruction = autoencoder.predict(face_vector.reshape(1, -1))\n",
    "\n",
    "# Do visual comparison\n",
    "plot_faces([reconstructions[4], reconstruction], n_row=1, n_col=2)\n",
    "\n",
    "# Do numeric comparison\n",
    "# We also normalize the black/white gradient to take values in [0,1] (divide by 255)\n",
    "np.mean(np.power((reconstructions[4].T - reconstruction) / 255, 2))"
   ]
  }
 ],
 "metadata": {
  "colab": {
   "provenance": []
  },
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}
